Processor cluster architecture and associated parallel processing methods

ABSTRACT

A parallel processing architecture comprising a cluster of embedded processors that share a common code distribution bus. Pages or blocks of code are concurrently loaded into respective program memories of some or all of these processors (typically all processors assigned to a particular task) over the code distribution bus, and are executed in parallel by these processors. A task control processor determines when all of the processors assigned to a particular task have finished executing the current code page, and then loads a new code page (e.g., the next sequential code page within a task) into the program memories of these processors for execution. The processors within the cluster preferably share a common memory (1 per cluster) that is used to receive data inputs from, and to provide data outputs to, a higher level processor. Multiple interconnected clusters may be integrated within a common integrated circuit device.

RELATED APPLICATIONS

This application is a continuation of U.S. application Ser. No. 10/369,182, filed Feb. 18, 2003, which claims the benefit of U.S. Provisional Appl. Nos. 60/358,133 and 60/358,290, both filed on Feb. 19, 2002, the disclosures of which are hereby incorporated by reference.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to computer architectures for the parallel processing of data.

2. Description of the Related Art

The Multiple-Instruction Multiple-Data (MIMD) parallel computer model is a general-purpose model that supports different software algorithms running on different processors. If several of the processors are executing the same piece of software, there could be unnecessary duplication of program memory, or congestion related to fetching instructions from a common shared memory. Program memory caching is a common way to help alleviate this situation, but as the number of processors grows, the caching technique may become less effective. Instruction caching is also quite complex, tending to increase silicon area and processor power consumption. Systems-On-Chip (SOCs) have limited silicon area, processor speed, and power, and should avoid wasting any of these resources.

Some data manipulation algorithms lead one to consider the Single-Instruction Multiple-Data (SIMD) parallel computer model. This model assumes that most of the time the same computer instruction can be applied to many different sets of data in parallel. If this assumption holds, SIMD represents a very economical parallel computing paradigm. However, some complicated algorithms have many data-dependent control structures that would require multiple instruction streams for various periods of time. Adding to this complication is the possible need to support multiple algorithms simultaneously, each of which may operate on a different set of (independent) data. Thus, large amounts of program memory may be needed.

SUMMARY OF THE INVENTION

The present invention provides a parallel processing architecture that offers a high level of performance without the need for large amounts of redundant program memory. In a preferred embodiment, a plurality of processors are coupled to a common code distribution bus to form a processor cluster. The code distribution bus is used to download or dispatch code pages (blocks of code) to the processors for execution, preferably from a common program memory. The processors are also coupled to a shared memory that is used to receive inputs (including data sets to be processed) and to provide outputs to other processing entities. A number of interconnected processor clusters may be integrated within a common integrated circuit device.

Each processor only needs enough program memory to store a single code page at a time. Typically, each processor has a local program memory that is between about 1K (1024) and 4K instructions in size. For processors that use four-byte instructions, this results in program memories of 4K to 16K bytes in size. Although each processor preferably has its own respective program memory, two or more processors may alternatively share a local program memory.

A program or task to be executed by the cluster is initially subdivided into multiple code pages by selecting appropriate boundary locations. The code page boundaries are preferably selected such that (1) each code page may be fully loaded into the program memory of one of the cluster's processors, (2) major program loops are fully contained within code pages (so that frequent code page “swaps” are not needed), and (3) execution of the program or task proceeds in a predictable order from one code page to the next.

The cluster of processors may optionally be subdivided into two or more groups (“task groups”) for purposes of executing tasks. For example, a cluster of eight processors may be subdivided into two four-processor task groups, one of which continuously executes a first task and the other of which continuously executes a second task. The tasks may, for example, include voice processing algorithms that are continuously applied in real time to voice channels, although other applications are possible. The processors within a task group execute code pages in parallel, and each such processor typically processes a different set or collection of data. Task groups may also be formed that include processors from multiple clusters.

In operation according to one embodiment, a task control processor broadcasts a code page over the code distribution bus to all processors within a task group. This code page is stored within the respective program memories of each of the processors in the task group, and each such processor executes the code page from its respective memory. Once all processors within the task group have finished executing the code page, the task control processor broadcasts the next code page of the assigned task to these processors for execution. Execution of the task proceeds in this manner until all code pages of the task have been executed, at which point the task may be repeated (applied to a new data set) or terminated. In some cases, a single code page may be adequate for a complete task, so that once execution starts there is no need to load additional pages.
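The broadcast-then-barrier cycle just described can be modeled in a few lines. The sketch below is a sequential simulation under stated assumptions: `ClusterProcessor`, `run_task`, and the string "instructions" are hypothetical stand-ins, and the members of a task group, which run concurrently in hardware, are stepped one after another here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ClusterProcessor:
    """Hypothetical model of a cluster processor whose program memory holds one page."""
    pid: int
    program_memory: List[str] = field(default_factory=list)
    executed_pages: List[int] = field(default_factory=list)
    done: bool = False

    def load(self, page: List[str]) -> None:
        # A broadcast replaces the entire contents of the local program memory.
        self.program_memory = list(page)
        self.done = False

    def execute(self, page_index: int) -> None:
        # Stand-in for running the loaded page to completion on this processor's data.
        self.executed_pages.append(page_index)
        self.done = True


def run_task(task_pages: List[List[str]], group: List[ClusterProcessor]) -> int:
    """Broadcast each page of a task to a task group; return the number of broadcasts."""
    broadcasts = 0
    for index, page in enumerate(task_pages):
        for proc in group:              # one bus broadcast loads every member
            proc.load(page)
        broadcasts += 1
        for proc in group:              # in hardware the members run concurrently
            proc.execute(index)
        # Barrier: the next page is broadcast only once every member has finished.
        if not all(proc.done for proc in group):
            raise RuntimeError("task group did not finish the current code page")
    return broadcasts
```

With three pages and four processors, the bus carries three broadcasts rather than twelve individual loads, which is the point of sharing one code distribution bus.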

One aspect of the invention is thus a parallel processing architecture comprising a plurality of processors that share a common code distribution bus. Pages or blocks of code are concurrently loaded into respective program memories of these processors over the code distribution bus, and are executed in parallel by the processors. The plurality of processors may, but need not, be a subset of a larger cluster of processors that share a common code distribution bus. The plurality of processors preferably share a common memory (1 per cluster) that is used to receive data inputs and to provide data outputs. A task control processor preferably detects when all of the plurality of processors have finished executing the code page, and then loads a new code page (e.g., the next sequential code page within a task) into the processors' respective memories for execution.

Another aspect of the invention is a method for subdividing a task (code sequence) into a plurality of code pages to be executed by one or more processors within a cluster (or spread across multiple clusters). The task is preferably subdivided into code pages such that the code pages may be loaded and executed in an order that is known prior to execution. In addition, each code page is preferably sufficiently small in size to fit within a program memory of a processor of the cluster. Any program loops of the task are preferably fully contained within the code pages, such that execution proceeds sequentially from one code page to the next when the task is executed.

Yet another aspect of the invention is an architecture that supports the ability for processors within a cluster to be assigned or allocated to tasks to form two or more task groups. Preferably, each processor includes a task ID register that may be loaded with a task number. When a code page is broadcast on the code distribution bus in association with a particular task, all processors assigned to that task (e.g., all processors having the corresponding task number in their respective task ID registers) respond by receiving and executing the code page.
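The task ID filtering can be illustrated with a small model in which every processor sees each broadcast but only those whose task ID register matches the page's task number accept it. `Processor` and `broadcast_page` are hypothetical names, and the eight-processor split into two task groups mirrors the example given earlier in this summary.

```python
from typing import List


class Processor:
    """Hypothetical cluster processor with a programmable task ID register."""

    def __init__(self, pid: int, task_id: int):
        self.pid = pid
        self.task_id = task_id          # loaded by a task-assignment command
        self.program_memory: List[str] = []


def broadcast_page(processors: List[Processor], task_id: int, page: List[str]) -> List[int]:
    """Deliver a code page tagged with task_id; only matching processors accept it."""
    loaded = []
    for proc in processors:
        if proc.task_id == task_id:
            proc.program_memory = list(page)
            loaded.append(proc.pid)
    return loaded


# Eight processors split into two four-processor task groups (tasks 1 and 2).
cluster = [Processor(pid, 1 if pid < 4 else 2) for pid in range(8)]
print(broadcast_page(cluster, 2, ["instr0", "instr1"]))  # [4, 5, 6, 7]
```

Reassigning a processor to another task group then amounts to rewriting its task ID register; no wiring changes are needed.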

Neither this summary nor the following detailed description section is intended to define the invention. The invention is defined by the claims.

BRIEF DESCRIPTION OF THE DRAWINGS

The above and other features of the invention will now be described with reference to the drawings summarized below. These drawings and the associated description are of a preferred embodiment of the invention, and are not intended to limit the scope of the invention.

FIG. 1 illustrates a processor cluster architecture according to one embodiment of the invention.

FIG. 2 illustrates example task and code page transitions during program execution.

FIG. 3 illustrates example boundary locations for dividing a code sequence into code pages.

FIG. 4 illustrates one possible flow diagram for the Task Control Processor of FIG. 1.

FIG. 5 illustrates details of a cluster processor's code bus interface according to one embodiment of the invention.

FIG. 6 illustrates how multiple processor clusters of the type shown in FIG. 1 may be arranged hierarchically within an integrated circuit.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

FIG. 1 illustrates a parallel processing architecture according to one embodiment of the invention. This architecture combines some of the best features of SIMD, MIMD, and multitasking, and takes into account the need for modularity in System-On-Chip (SOC) products. As will be apparent, the architecture is particularly well suited for processing multiple instances of independent data streams or sets. Examples of applications for which the architecture may be used include voice channel processing, multi-channel data encryption, and 3G wireless base station receiver/transmitter functions.

As illustrated, the architecture comprises a cluster 30 of processors 32 (P0-P6), each of which has its own local program memory 34 and local data memory 35, and all of which share a cluster global memory 40. Multiple clusters 30 of this type may be provided within a single SOC or other silicon device. For example, as described below with reference to FIG. 6, multiple clusters 30 may be arranged and interconnected in a hierarchy within a single integrated circuit device.

The processors 32, which may be referred to as “cluster processors,” preferably access a cluster global memory 40 that is used to communicate with other entities, such as a host processor 36 or other higher-level processors 38 that control the overall operation of the cluster 30. The global memory's interface is shown with nine ports, seven for the cluster processors P0-P6 and two for input/output (I/O). Although seven cluster processors 32 are shown in this example, a greater or lesser number (L) of processors 32 (typically between 4 and 12) may be used per cluster 30. In addition, the global memory's interface may include ports for additional higher-level processors 38.

As illustrated, each cluster processor 32 is coupled to a common code distribution bus 42. This bus 42 is used to dispatch “code pages,” or short sequences of code, to individual processors 32 for execution. The code page being executed by a given processor 32 is stored within that processor's local program memory 34. Each local program memory is typically only large enough to hold a single code page at a time. As described below, an important aspect of the design is that a code page may be dispatched to (loaded into the local program memories 34 of) multiple cluster processors 32 at one time.

Although each processor 32 is shown as having its own local program memory 34, a single local program memory 34 may alternatively supply instructions to two or more cluster processors 32, thereby conserving even more silicon. To avoid performance loss, such a memory would preferably deliver instructions at two or more times the normal rate, such as by fetching two or more instructions at a time.

The task of dispatching the code pages to the processors 32 is performed by a task control processor (TCP) 44. The cluster processors 32, the task control processor 44, the host processor 36, and the higher-level processors 38 may be Pipelined Embedded Processors (PEP™) as described in Hobson et al., “An Embedded-Processor Architecture for Parallel DSP Algorithms,” Advanced Signal Processing Algorithms, Architectures, and Implementation Conference, Denver, Colo., August 1996, pp. 75-85, the disclosure of which is incorporated herein by reference. Other types of microprocessors may additionally or alternatively be used.

The task control processor 44 dispatches code pages to the processors 32 by reading the code pages from a common program memory 48, and broadcasting these code pages one at a time on the code distribution bus 42. As illustrated in FIG. 1, the code pages are stored in the common program memory 48 in sequences (CP0, CP1, . . . ), where each sequence of code pages represents a program or task 50 that is executed by the cluster processors 32. In one embodiment, the common program memory 48 is located on the same chip as the various processors 32, 36, 38, 44. In other embodiments, the common program memory may be off chip, or there may be both on-chip and off-chip memories for program code.

A single task control processor 44, code distribution bus 42, and common program memory 48 may be used to dispatch code pages to the cluster processors 32 of many different clusters 30, such as all clusters on a single SOC device (see FIG. 6). Alternatively, one or more additional task control processors 44, code distribution buses 42, and common program memories 48 may be provided within a given IC device to reduce loading.

In one embodiment, each processor 32 is suspended while code pages are being loaded into its program memory 34. In another embodiment, the processors 32 are able to continue execution while code pages are being loaded into their program memories 34. This may be accomplished by splitting the program memory 34 into two parts so that the processor can execute from one part while the other part is being reloaded. The parts may be different in size.
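A split program memory behaves like a ping-pong buffer. The sketch below, with the hypothetical class `SplitProgramMemory`, assumes that instruction fetches always come from the "active" part, that a new page is written only into the other part, and that an explicit swap (for example, at a code page boundary) switches execution over. How and when the real hardware performs the swap is not specified in the text.

```python
from typing import List, Optional


class SplitProgramMemory:
    """Hypothetical two-part program memory: run from one part while the other reloads."""

    def __init__(self, size_a: int, size_b: int):
        # The two parts may differ in size.
        self.parts: List[List[Optional[str]]] = [[None] * size_a, [None] * size_b]
        self.active = 0  # index of the part the processor is executing from

    def load_inactive(self, page: List[str]) -> int:
        """Write a new code page into the idle part; execution is not disturbed."""
        target = 1 - self.active
        if len(page) > len(self.parts[target]):
            raise ValueError("code page does not fit in the inactive part")
        self.parts[target][:len(page)] = page
        return target

    def swap(self) -> None:
        """Switch execution to the freshly loaded part."""
        self.active = 1 - self.active

    def fetch(self, pc: int) -> Optional[str]:
        """Return the instruction at address pc of the active part."""
        return self.parts[self.active][pc]
```

Because the parts may differ in size, a large page can only be staged into the larger part, which is one reason a design might pair a large and a small part.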

A given code page may be simultaneously dispatched over the code distribution bus 42 to any number of cluster processors 32 for execution. Specifically, the task control processor 44 may broadcast a given code page on the code distribution bus 42 to concurrently load that code page into the local program memories 34 of some or all of the cluster's processors 32, and possibly into the local program memories of processors 32 within other clusters 30. In the preferred embodiment, a code page transmitted on the code distribution bus 42 will be loaded into the local program memories 34 of all cluster processors 32 assigned to the corresponding task 50. The group of processors 32 assigned to a particular task 50 (or possibly to a set of tasks) is referred to as a “task group.” A given task group may, but need not, span (include processors from) multiple clusters 30. As described below, each cluster processor 32 may be programmed with an identifier that specifies the task or task group to which it is currently assigned.

After a code page has been stored in each participating processor's respective program memory 34, the processors 32 asynchronously execute the code page to completion, typically to process their own respective data set or sets. Once all processors 32 within the task group have finished executing a code page, the task control processor 44 broadcasts the next sequential code page of the assigned task to these processors 32 for execution. Execution of the task proceeds in this manner until all code pages of the task have been executed, at which point the task may be repeated (applied to new data sets) or terminated.

As mentioned above, each code page is typically a subset or block of a larger algorithm or program 50 being executed by a task group. To efficiently use the illustrated architecture, the program is preferably divided into code pages so as to achieve the following properties: (1) program loops are permitted as long as they are contained within respective code pages; (2) each code page is executed only a few times (and ideally once) per data set; and (3) program execution proceeds sequentially (in a predictable order) from one code page to the next. Property (1), which may be relaxed in some cases (e.g., by loop unrolling), allows each code page to be executed to completion before it is replaced within the relevant processor's program memory 34. The optimum code page size used to subdivide a program 50 is a design parameter that typically varies from program to program.
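A proposed subdivision can be checked mechanically against the fit requirement and property (1). The sketch below assumes the program is a single straight sequence of instruction offsets (so property (3) holds by construction), that page boundaries are given as start offsets, and that loops are given as half-open `[start, end)` offset ranges. `check_subdivision` is a hypothetical helper, and property (2) is not checked.

```python
from bisect import bisect_right
from typing import List, Tuple


def check_subdivision(program_length: int, boundaries: List[int],
                      loops: List[Tuple[int, int]], capacity: int) -> List[str]:
    """Return a list of problems; an empty list means the page split is acceptable.

    boundaries: sorted start offsets of every page after the first (page 0 starts at 0).
    loops: [start, end) instruction ranges occupied by program loops.
    capacity: program memory size in instructions.
    """
    starts = [0] + boundaries
    ends = boundaries + [program_length]
    problems = []
    for i, (s, e) in enumerate(zip(starts, ends)):
        if e - s > capacity:
            problems.append(f"page {i} has {e - s} instructions (> {capacity})")
    for s, e in loops:
        # Index of the page holding the loop's first and last instruction.
        first = bisect_right(starts, s) - 1
        last = bisect_right(starts, e - 1) - 1
        if first != last:
            problems.append(f"loop [{s}, {e}) spans pages {first}-{last}")
    return problems
```

For example, with 2K-instruction memories, cutting a 5000-instruction program at offsets 2048 and 4096 would split a loop occupying offsets 2000 to 2999, while cutting at 1800 and 3600 keeps every loop inside one page.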

The size of the local program memories 34 may be selected based on the optimum memory sizes for the different applications that will be executed by the cluster. Since different applications typically have different optimum memory sizes, a compromise may be appropriate in which the program memories are selected to be slightly smaller than or equal to the largest optimum memory size.

The sizes of the local data memories 35 may be selected based on the amount of execution state information that will be stored, as well as the number of independent data sets that will be kept in whole or in part. The latter depends on how many data sets a processor 32 can manage in real time. Although the illustrated embodiment uses separate local memories 34, 35 for program code and data (as in a Harvard architecture), a shared local memory may alternatively be used.

An important benefit of the foregoing method of subdividing the program is that it allows relatively small local program memories 34 to be used (e.g., 1K to 4K instructions, and more typically 1.5K to 2.5K instructions; or about 4K to 16K bytes). Specifically, because the code pages are ordinarily executed to completion and in a predetermined order, program memories large enough to hold only a single code page at a time may be used without incurring a significant performance penalty. Some processors 32 may optionally have larger program memories 34 than others to accommodate relatively large code pages.

By using small program memories, the present architecture captures the essence of SIMD. SIMD architectures are considered “fine grain” by computer architects because they have minimal resources but replicate these resources a large number of times. As mentioned above, this technique can be a very effective way to harness the power of parallelism. The present parallel architecture is efficient for both multiple tasks and multiple data sets, but remains as “fine grain” as possible.

The ability to divide a program into code pages, as set forth above, is possible for reasons similar to those that enable modern computers to commonly use program and data caches, which exploit the properties of temporal and spatial locality. Sequential execution of an instruction sequence demonstrates spatial locality, while having loops embedded within a short piece of code demonstrates temporal locality.

Task Control

As described above, each code page generally represents a portion of a larger program or task 50 being executed. FIG. 2 illustrates example task boundaries 60 and code page (subtask) boundaries 62 as a group of cluster processors 32 execute tasks. The markings at the code page boundaries 62 represent the time needed (not drawn to scale) to load a new code page into the program memories 34 of the group of processors 32. If a 200 MHz code distribution bus 42 is used (resulting in two million cycles per 10 ms interval), and each code page is 1024 (1K) instruction words long, the number of clock cycles needed to load a code page is 1024 plus overhead clock cycles. If overhead is 100%, about 2048 clock cycles are needed. Ten blocks (code pages) use up 20,480 cycles, or about 1% of one 10 ms interval.

It is also possible for one processor 32 to operate on several independent data sets. This may be accomplished either by reloading all code pages for each data set, or by structuring each code page to process several data sets. The latter usually requires more local data memory 35 to store intermediate information, so the former is preferred if time permits (a time-space tradeoff). If a set of code pages is repeated for four data sets, the aforementioned overhead increases to about 4% of available time. Every application will typically have different parameters, but some small amount of time should be budgeted for this task swap method (e.g., 4-15%). In the preferred embodiment, data sets are initialized when the power is turned on, and are maintained by code pages as they execute in real time. In some embodiments, it may be possible to change the number of data sets as the system runs.
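The load-time budget in the two preceding paragraphs can be reproduced with a small calculation. The sketch assumes, as the example does, that the bus moves one instruction word per clock and that overhead is a proportional extra (100% doubles the cycle count); `load_overhead_fraction` is a hypothetical helper name.

```python
def load_overhead_fraction(bus_hz: float, interval_s: float, page_words: int,
                           overhead: float, pages: int, data_sets: int = 1) -> float:
    """Fraction of one scheduling interval spent broadcasting code pages.

    Assumes one instruction word per bus clock plus a proportional overhead
    (overhead=1.0 means 100%, i.e. twice the raw word count).
    """
    cycles_per_interval = bus_hz * interval_s          # 200 MHz * 10 ms = 2,000,000
    cycles_per_page = page_words * (1.0 + overhead)    # 1024 * 2 = 2048
    return pages * data_sets * cycles_per_page / cycles_per_interval


print(load_overhead_fraction(200e6, 0.010, 1024, 1.0, pages=10))               # about 0.01
print(load_overhead_fraction(200e6, 0.010, 1024, 1.0, pages=10, data_sets=4))  # about 0.04
```

The first call gives 20,480 / 2,000,000 = 1.024% and the second 4.096%, matching the "about 1%" and "about 4%" figures above.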

One factor to consider is that code swapping for different tasks is typically interleaved over the same code distribution bus 42. Thus, the markings 62 in FIG. 2 could be shown in different colors, each color representing a different task. Tasks that are not in code swap mode can continue to run. A second factor is that some data manipulation algorithms take more time than others. Due to these complexities, it may be desirable to run a simulation for each task mix.

A preferred way to handle the above issues is to use a software task scheduler and prioritizer 56 (hereinafter “task scheduler”) to keep track of the task mix. As depicted in FIG. 1, the task control processor 44 may execute the task scheduler 56 from a local memory 57. Each task mix preferably consists of a fixed or a limited number (e.g., 8) of different tasks. Tasks are prioritized according to the real-time requirements of a particular task's data sets. For example, if one data set has to be revisited every 125 microseconds and another every 1 millisecond, the shorter time interval is assigned a higher priority. Under the control of the task scheduler 56, the task control processor sequences through the code pages associated with these tasks in a fixed order, as shown in FIG. 4. If all code pages have their execution lengths controlled, the timing can be handled by handshaking between the task control processor and the group of processors 32 that are running the particular code page.
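The prioritization rule (shorter revisit interval means higher priority) can be sketched as a sort. The fixed page order built by `page_sequence` is only an assumption for illustration: it sends every page of the highest-priority task first and then moves down the list, whereas the actual order of FIG. 4 may interleave tasks differently. Both function names and the task names are hypothetical.

```python
from typing import Dict, List, Tuple


def prioritize(revisit_intervals_s: Dict[str, float]) -> List[str]:
    """Order tasks so the shortest revisit interval (tightest deadline) comes first."""
    return sorted(revisit_intervals_s, key=lambda task: revisit_intervals_s[task])


def page_sequence(task_order: List[str], pages_per_task: Dict[str, int]) -> List[Tuple[str, int]]:
    """Hypothetical fixed-order schedule table: (task, page index) pairs for one pass."""
    return [(task, page) for task in task_order for page in range(pages_per_task[task])]


order = prioritize({"codec_task": 1e-3, "echo_task": 125e-6})
print(order)                                               # ['echo_task', 'codec_task']
print(page_sequence(order, {"echo_task": 2, "codec_task": 3}))
```

Computing such a table offline and then replaying it in a loop keeps the task control processor's run-time work trivial, which fits the fixed-order sequencing described above.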

In one embodiment, each processor 32 is assigned an address (such as a table row and column position) via hardware so the task control processor 44 can identify and assign specific tasks to individual processors 32 (see FIG. 5). Initialization is preferably handled by the task control processor or through a scan chain interface. After initialization (during which all processors 32 are assigned an initial task ID), each processor 32 waits for its first code page to be loaded and a run command to be given. At this point, the task control processor starts on its first code page, and asks all processors 32 that will execute that code page if they are ready for a page swap (a time-out mechanism can be implemented to prevent system deadlock, e.g., during code debugging). When all of the processors 32 within the relevant task group are ready, the code page is loaded (broadcast) and the processors are directed to continue. The task control processor then moves on to the next code page (FIG. 4). Even though each processor in a task group executes a copy of the same code page, they may not all finish at quite the same time due to data-dependent execution paths. The test for code page completion is thus a synchronizing mechanism in the preferred embodiment.
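The ready-for-swap handshake with a deadlock-avoiding time-out can be sketched as a polling loop. `wait_for_page_swap` and the `is_ready` callback are hypothetical; the callback stands in for whatever per-processor status the bus interface actually reports, and a real task control processor would likely use a hardware timer rather than Python's `time.monotonic`.

```python
import time
from typing import Callable, Iterable


def wait_for_page_swap(group: Iterable, is_ready: Callable[[object], bool],
                       timeout_s: float, poll_s: float = 0.0) -> bool:
    """Poll until every processor in the task group reports ready, or give up.

    Returns True when the group may receive the next code page, or False on
    time-out, which keeps a stuck processor (e.g. while debugging) from
    deadlocking the task control processor.
    """
    members = list(group)
    deadline = time.monotonic() + timeout_s
    while True:
        if all(is_ready(proc) for proc in members):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

On a False return, the caller can decide whether to report the fault, skip the group, or retry, a policy the description leaves open.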

On special time boundaries, the task control processor may use an independent time source to check overall timing. This time source may be associated with another task, and may involve a programmable counter that is tied to an independent timing source (e.g., a real-time clock). If the processors 32 are actually halted between code pages, the task control processor 44 may require access to a processor's run control mechanism.

Task Interface Features

In one embodiment, the code distribution bus structure is such that a common set of wires can be used either to transfer a command to a specific cluster processor 32 or to all cluster processors at once. A separate wire is used to specify whether the bus is in command mode or data mode. A bus command may consist of three fields: a command function, a processor address, and command data. Processor addresses can consist of a cluster address part and a processor-within-cluster address part (like choosing a row and column in an array). A special address may be used to signal that all processors should be affected. The command function part signals an action to be performed (see below). The command data part provides additional information to go with the command function (such as a task ID value). The code distribution bus 42 may be wide enough to contain all of these command fields, and sufficiently wide to transmit multiple instructions at one time. Each cluster processor 32 may have a state machine interface to recognize and execute commands that are addressed to it.
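The three-field command word and the "all processors" address can be illustrated with bit packing. The field widths (4-bit function, 4-bit cluster, 4-bit processor, 16-bit data) and the all-ones broadcast address are assumptions made for this sketch; the description fixes neither. The separate command/data mode wire is outside the word and is not modeled.

```python
from typing import NamedTuple

# Hypothetical field widths; the description does not specify them.
FUNC_BITS, CLUSTER_BITS, PROC_BITS, DATA_BITS = 4, 4, 4, 16
# Hypothetical reserved address meaning "all processors": all ones in both parts.
BROADCAST = ((1 << CLUSTER_BITS) - 1, (1 << PROC_BITS) - 1)


class BusCommand(NamedTuple):
    function: int   # action to perform, e.g. assign a task ID
    cluster: int    # cluster address part
    proc: int       # processor-within-cluster address part
    data: int       # extra information, e.g. a task ID value


def encode(cmd: BusCommand) -> int:
    """Pack the fields into one bus word, function in the most significant bits."""
    word = 0
    for value, bits in ((cmd.function, FUNC_BITS), (cmd.cluster, CLUSTER_BITS),
                        (cmd.proc, PROC_BITS), (cmd.data, DATA_BITS)):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"field value {value} does not fit in {bits} bits")
        word = (word << bits) | value
    return word


def decode(word: int) -> BusCommand:
    """Unpack a bus word produced by encode()."""
    data = word & ((1 << DATA_BITS) - 1)
    word >>= DATA_BITS
    proc = word & ((1 << PROC_BITS) - 1)
    word >>= PROC_BITS
    cluster = word & ((1 << CLUSTER_BITS) - 1)
    word >>= CLUSTER_BITS
    return BusCommand(word, cluster, proc, data)


def addressed_to(cmd: BusCommand, cluster: int, proc: int) -> bool:
    """True if the processor at (cluster, proc) should act on this command."""
    target = (cmd.cluster, cmd.proc)
    return target == BROADCAST or target == (cluster, proc)
```

Reserving one address value for broadcast costs a single processor position per cluster in this encoding; a dedicated broadcast bit would be an equally plausible alternative.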

To support the foregoing features, the cluster processors 32, the task control processor 44, and the code distribution bus 42 preferably implement or support the following:

- A command to assign a task number to a selected processor 32 (e.g., to assign that processor to a task group). Each processor 32 may include a programmable task ID register for storing this number.
- At least one command for broadcasting instructions to processors 32 that have the same “task number.” Preferably, a “burst” command is implemented that allows all instructions in a code page to be broadcast one after the other without stopping (once the mechanism is started), e.g., through a sequence such as: write_start, starting_address, instr1, instr2, . . . , instr_last, write_stop.
- A command to determine whether all processors 32 in a particular task group have finished executing the current code page. For example, a task ID is put on the bus 42, and each processor's current task ID is compared with the one on the bus. If any processor 32 detects a match and is not yet ready for a new code page, that processor forces a common reply signal line (FIG. 5) to the false state.
- A command or sequence of commands by which the task control processor may alter a processor's state from “halted” or “suspended” to “running,” or from “running” to “halted” or “suspended.” In the former case, a method is thereby provided to notify a processor 32 when a code page is ready for execution. When a processor has finished executing a code page, it may be desirable for it to execute a small piece of “framework” code that does not need to be reloaded with each code page change. The framework code maintains proper state between page changes.
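Two of the listed mechanisms lend themselves to a short model: framing a code page as one burst, using the write_start ... write_stop sequence given above, and the shared reply line that any matching but unfinished processor pulls false. Representing processors as dictionaries and the markers as strings are simplifications for illustration; `burst_frame` and `completion_reply` are hypothetical names.

```python
from typing import List, Sequence, Union

BusWord = Union[str, int]


def burst_frame(start_address: int, instructions: Sequence[int]) -> List[BusWord]:
    """Bus word sequence that broadcasts one code page as a single burst."""
    return ["write_start", start_address, *instructions, "write_stop"]


def completion_reply(task_id: int, processors: Sequence[dict]) -> bool:
    """Model of the common reply line for a 'task group finished?' query.

    The line rests in the true state; any processor whose task ID matches the
    one on the bus and which is not yet ready forces it false.
    """
    line = True
    for proc in processors:
        if proc["task_id"] == task_id and not proc["ready"]:
            line = False
    return line
```

Because the reply is a wired combination of all processors, the task control processor learns the status of an entire task group in one bus transaction regardless of how many processors it contains.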

The bus 42 may also include one or more commands to facilitate program memory testing. In one embodiment, test data (acting like a portion of a code page) is broadcast over the bus 42, and participating processors 32 (with associated logic) have the contents of their program memories 34 compared (word by word) with the broadcast test data, thereby permitting multiple program memories to be tested in parallel (FIG. 5). Error reporting can be handled in a manner similar to how code page execution completion is handled.

Software applications that are intended to execute on a cluster 30 of the type described above preferably have a single entry point, and are designed either to exit or to start over on a new set of data upon reaching a logical end point. Some real-world applications, such as voice channel processing, never run out of data, so voice channel processing programs commonly process short bursts of voice data and then start over on the next burst. Other real-world applications that run continuously include wireless base station control, multimedia servers, and network switches.

Subdividing Programs into Code Pages

FIG. 3 is a flow diagram of a typical software application showing possible code page boundaries 70. These boundaries may be selected by a designer based on an inspection of the compiled application code, or may be selected by an executable program that selects the boundaries based on a set of rules. The boundaries are preferably selected such that every code page may be loaded in its entirety into the local program memory 34 of a processor 32. In the example shown in FIG. 3, it is assumed that all of the processors 32 within the relevant cluster 30 have equal-size program memories 34 of 2K (2048) instructions (words), or 8K bytes if the instructions are 4 bytes each.

In this example, there are three blocks of in-line code and two blocks of looping code (in general there will be many more blocks of each type). The application can be subdivided into code pages of up to 2K words in several possible ways, as shown in FIG. 3. The most effective subdivision is to have the minimum number of code pages without breaking a significant block of looping code. Each code page preferably executes only once from beginning to the start-over point. A loop can optionally be subdivided into two smaller loops or unrolled into straight code if a suitable code page boundary cannot be found. Because the looping code typically uses up many more processor cycles during execution than are needed to load the code pages, the loading events do not significantly impair performance. This is analogous to the cache miss phenomenon, and the resulting performance loss, in a more traditional computer architecture.
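The rule-based boundary selection mentioned for FIG. 3 could be automated roughly as follows. This is a greedy sketch under simplifying assumptions, not the method used to produce FIG. 3: the program is a straight sequence of `(size_in_words, is_loop)` blocks, in-line code may be cut at any word, a loop must stay inside one page, and a loop larger than a page is rejected (it would first have to be split or unrolled, as noted above). `split_into_pages` is a hypothetical name.

```python
from typing import List, Tuple


def split_into_pages(blocks: List[Tuple[int, bool]], capacity: int) -> List[int]:
    """Greedy code page boundaries for a straight-line sequence of code blocks.

    blocks: (size_in_words, is_loop) in program order. In-line code may be cut
    anywhere; a loop block must stay inside one page. Each page is filled as
    far as possible. Returns the start offset of every page after the first.
    """
    boundaries: List[int] = []
    page_start = 0   # offset where the current page begins
    offset = 0       # offset where the current block begins
    for size, is_loop in blocks:
        if is_loop and size > capacity:
            raise ValueError("loop larger than a page: split or unroll it first")
        block_end = offset + size
        while block_end - page_start > capacity:
            if is_loop:
                # The whole loop does not fit in what remains: start a page at the loop.
                page_start = offset
            else:
                # Fill the current page completely, cutting inside the in-line code.
                page_start += capacity
            boundaries.append(page_start)
        offset = block_end
    return boundaries


# Three in-line blocks and two loops, 2K-word program memories.
print(split_into_pages([(600, False), (1200, True), (500, False), (1500, True), (400, False)], 2048))
```

For the five-block example, the result is [2048, 4096]: three pages, with the first loop (offsets 600 to 1799) in page 0 and the second (offsets 2300 to 3799) in page 1.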

As mentioned above, the processors 32 of a cluster 30 may be divided into two or more task groups, each of which executes a different software application or task. In such cases, the task control processor broadcasts code pages over the bus 42 to all processors 32 within a task group, such that the program memories 34 of these processors are loaded in parallel.

Code pages may be sorted according to the order in which they should be loaded by the task control processor. The code page ordering is preferably chosen in advance with the aid of software performance analysis tools and a task mix simulator, taking into consideration timing constraints for each task. The resulting task schedule information may be kept as tables (not shown) in the task control processor's local memory 57. This schedule information may be loaded from the host processor 36, or via another agent such as a boundary scan interface, during the power-up sequence.

Voice Channel Processing

As an example of how the current architecture may be applied to a real application, consider how voice channels may be processed. There are several standards and proprietary methods for processing voice channels. Let a task consist of a group of standard voice processing algorithms (as programs and code pages). Such a task may, for example, perform voice activity detection, silence suppression, echo cancellation, and/or voice compression, and may run continuously on each processor 32 within a task group. There can be several such tasks, each of which may differ in some respects, such as the amount of voice compression provided or the length of echo tail processed. Different tasks may be assigned to different task groups within the same cluster 30, or spread across multiple clusters.

In the preferred embodiment, a single cluster processor 32 is restricted to one task at a time, and depending upon the capability of each processor 32, one or more voice channels can be assigned to it (via communication with a host processor 36). Voice channel assignment to processors is preferably done by prior system analysis and modeling. Small changes can optionally be handled dynamically. Voice data samples are fed to each processor 32 through the shared cluster global memory 40 at regular intervals by a higher-level processor 38. Some intervals may be as short as a few multiples of the basic telephony sample period of 125 microseconds, and some may be 5 to 30 milliseconds, depending upon the voice compression algorithm. Code pages are dispatched to the processors 32 in a task group in such a way that incoming voice samples are consumed at their incoming rate. Modified voice sample outputs are also produced at the same rate, all through the shared cluster global memory 40.
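The interval lengths above translate directly into per-channel buffer sizes in the shared cluster global memory. The sketch assumes one sample per 125-microsecond telephony period (8 kHz sampling) and intervals that are whole multiples of that period; `samples_per_interval` is a hypothetical helper.

```python
SAMPLE_PERIOD_US = 125  # basic telephony sample period (8 kHz sampling)


def samples_per_interval(interval_us: int) -> int:
    """Number of voice samples that arrive for one channel during an interval."""
    if interval_us % SAMPLE_PERIOD_US:
        raise ValueError("interval must be a whole number of sample periods")
    return interval_us // SAMPLE_PERIOD_US


print(samples_per_interval(500))     # 4 samples (a few sample periods)
print(samples_per_interval(5_000))   # 40 samples (5 ms)
print(samples_per_interval(30_000))  # 240 samples (30 ms)
```

Multiplying by the number of channels assigned to a processor gives the amount of input it must consume, and output it must produce, per interval to keep pace with the incoming rate.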

Hierarchical Bus Structure

FIG. 6 illustrates one example of how multiple clusters 30 of the type described above may be interconnected within an integrated circuit device. In this example, the clusters 30 are arranged within a two-level hierarchy with one root cluster 30 and J leaf clusters. The root cluster contains K processors 32, and each leaf cluster contains L processors 32.

All processors 32 of all clusters 30 in this example are coupled to a common code distribution bus 42, and are managed (receive code pages, etc.) by a common task control processor 44. Multiple code distribution bus/task control processor pairs may alternatively be provided, each of which services a different cluster 30 or group of clusters.

Each cluster 30 in the hierarchy includes a bus cycle assignment unit 60 that controls accesses to that cluster's shared memory 40. The shared memory 40 of the root cluster may be omitted. In the preferred embodiment, each leaf-level processor 32 can access only the shared memory 40 within its respective cluster 30, and not other shared memories. Each root-level processor, however, can access the shared memory 40 of the root cluster 30 (if provided) and the shared memories 40 of the leaf clusters.
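The access rule of the preferred embodiment can be stated as a small predicate. In this sketch, leaf clusters are numbered from 0, and the root cluster is represented by a hypothetical `ROOT` marker (`None`); the function only encodes which shared memories 40 a processor may address, not how the bus cycle assignment units arbitrate among requests.

```python
from typing import Optional

ROOT: Optional[int] = None  # hypothetical marker for the root cluster; leaves are 0..J-1


def can_access(proc_cluster: Optional[int], mem_cluster: Optional[int],
               root_memory_present: bool = True) -> bool:
    """Return True if a processor in proc_cluster may access mem_cluster's shared memory."""
    if mem_cluster is ROOT and not root_memory_present:
        return False  # the root cluster's shared memory may be omitted entirely
    if proc_cluster is ROOT:
        return True   # root-level processors reach the root memory and every leaf memory
    return proc_cluster == mem_cluster  # leaf-level processors: own cluster's memory only
```

For example, `can_access(2, 3)` is `False` (a leaf processor cannot reach another leaf's memory), while `can_access(ROOT, 3)` is `True`.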

Each bus cycle assignment unit 60 controls memory accesses by allocating timeslots on its respective bus 62 to specific processors 32. The timeslots may be allocated to processors according to a round-robin protocol, although other types of assignment protocols may alternatively be used. Each leaf-level bus cycle assignment unit 60 allocates timeslots to the L leaf processors 32 and also processes requests from the K root processors 32, whose timeslots are allocated by the root-level bus cycle assignment unit. With this hierarchical arrangement, the root-level processors 32 may, for example, be used primarily to load input datasets into, and to read output datasets from, the shared memories 40 of the leaf clusters. The processors 32 of the leaf clusters 30, on the other hand, may execute one or more signal processing tasks (echo cancellation, voice compression, etc.).
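A round-robin bus cycle assignment unit can be modeled as a rotating-priority arbiter that grants at most one timeslot per bus cycle. The sketch below is one common way to realize round robin, not necessarily the circuit used in unit 60. For a leaf cluster it assumes, hypothetically, that requesters 0 to L-1 are the leaf processors and requester L is a single port through which requests already arbitrated by the root-level unit arrive.

```python
from typing import Optional, Set


class RoundRobinArbiter:
    """Rotating-priority arbiter: at most one grant per bus cycle among n requesters."""

    def __init__(self, n_requesters: int) -> None:
        self.n = n_requesters
        self.last = n_requesters - 1  # so requester 0 has top priority on the first cycle

    def grant(self, requests: Set[int]) -> Optional[int]:
        """Grant the first requester after the previous winner, wrapping around."""
        for offset in range(1, self.n + 1):
            candidate = (self.last + offset) % self.n
            if candidate in requests:
                self.last = candidate  # the winner drops to lowest priority next cycle
                return candidate
        return None  # no requests: the timeslot goes unused
```

With L = 4, `RoundRobinArbiter(5)` with all five requesters active every cycle grants 0, 1, 2, 3, 4, 0, and so on, so each leaf processor and the root port each receive one of every five timeslots. When only some requesters are active, idle ones are skipped rather than wasting a slot.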

The preferred number of processors (L) in a leaf cluster 30 depends on how much contention can be tolerated in the global memories 40. For a regular array, L might be about the square root of N, where N is the total number of processors 32 in the leaf clusters. There is also a dependence on the lengths of the code distribution bus 42 and the memory data and address buses. These buses should be short enough (i.e., lightly loaded) to support single-clock-cycle data transfer. However, multiple clocks per transfer are also possible. Typical ranges for J, K, and L are as follows: L = 4-10; K = 3-6; and J = 4-10.
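The square-root rule has a simple consequence. If the N leaf processors are spread evenly over J leaf clusters, then N = J × L, and choosing L ≈ √N gives J ≈ L, which is consistent with the matching typical ranges quoted for J and L. The hypothetical helper below applies that rule of thumb; the rounding choices are assumptions, and when L does not divide N exactly, J × L slightly exceeds N.

```python
import math
from typing import Tuple


def size_hierarchy(n_leaf: int, k_root: int) -> Tuple[int, int, int]:
    """Return (J, L, total processors) for a two-level hierarchy using L ~ sqrt(N)."""
    l_per_leaf = max(1, round(math.sqrt(n_leaf)))  # processors per leaf cluster
    j_leaves = math.ceil(n_leaf / l_per_leaf)      # enough leaf clusters to hold N
    total = k_root + j_leaves * l_per_leaf         # root processors plus all leaf processors
    return j_leaves, l_per_leaf, total
```

For example, 64 leaf processors give L = 8 and J = 8; with K = 4 root processors the device holds 68 processors, all within the typical ranges above.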

Additional details of the hierarchical bus structure depicted in FIG. 6 are disclosed in a concurrently-filed U.S. patent application by Hobson et al. titled HIERARCHICAL BUS STRUCTURE AND MEMORY ACCESS PROTOCOL FOR MULTIPROCESSOR COMMUNICATIONS (application Ser. No. 10/369,340, filed Feb. 18, 2003), and in corresponding U.S. provisional application No. 60/358,133, filed Feb. 19, 2002, the disclosures of which are hereby incorporated by reference.

Although this invention has been described in terms of certain preferred embodiments and applications, other embodiments and applications that are apparent to those of ordinary skill in the art, including embodiments and applications which do not provide all of the features and advantages set forth herein, are also within the scope of this invention. Accordingly, the scope of the present invention is intended to be defined only by reference to the appended claims.
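The page-by-page method recited in claim 36 (broadcast a code page, let every processor run it asynchronously on its own data, wait until all have finished, then broadcast the next page) can be simulated in software. This is a minimal sketch under stated assumptions: a code page is modeled as a hypothetical Python function acting on a processor's private data, threads stand in for the processors, and the common completion signal line is modeled as a logical AND of per-processor done flags. It illustrates the control flow only, not the hardware.

```python
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Callable, Dict, List, Optional

CodePage = Callable[[Dict[str, int]], None]  # hypothetical model of a code page


class Processor:
    """A cluster processor with a private local program memory and its own dataset."""

    def __init__(self, data: Dict[str, int]) -> None:
        self.local_memory: Optional[CodePage] = None
        self.data = data
        self.done = True

    def load(self, page: CodePage) -> None:
        self.local_memory = page  # copy received from the code distribution bus
        self.done = False

    def run(self) -> None:
        assert self.local_memory is not None
        self.local_memory(self.data)  # execution path may depend on this processor's data
        self.done = True


def completion_line(procs: List[Processor]) -> bool:
    """Model of the common signal line: asserted only when every processor is done."""
    return all(p.done for p in procs)


def run_task(pages: List[CodePage], procs: List[Processor]) -> None:
    """Task-control loop: broadcast, execute asynchronously, wait for all, repeat."""
    with ThreadPoolExecutor(max_workers=len(procs)) as pool:
        for page in pages:
            for p in procs:                                # (a) one broadcast loads every memory
                p.load(page)
            futures = [pool.submit(p.run) for p in procs]  # (b) asynchronous execution
            wait(futures)                                  # (c) block until all have finished
            for f in futures:
                f.result()                                 # re-raise any processor error
            assert completion_line(procs)                  # (d) only now load the next page


def page_scale(d: Dict[str, int]) -> None:
    """Example page with a data-dependent branch."""
    d["x"] = d["x"] * 2 if d["x"] > 0 else 0


def page_offset(d: Dict[str, int]) -> None:
    """Example follow-on page that uses the previous page's result."""
    d["y"] = d["x"] + 1


if __name__ == "__main__":
    cluster = [Processor({"x": 3}), Processor({"x": -1}), Processor({"x": 5})]
    run_task([page_scale, page_offset], cluster)
    print([p.data for p in cluster])
```

Because each processor keeps its own dataset, the same two pages yield `{"x": 6, "y": 7}`, `{"x": 0, "y": 1}`, and `{"x": 10, "y": 11}` for the three inputs, with the second processor taking a different branch through the first page.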

1-35. (canceled)
36. A method of executing a task in parallel on a plurality of processors, comprising: (a) broadcasting on a code distribution bus a current code page of a sequence of code pages for the task to concurrently load the current code page into a plurality of local memories of the plurality of processors; (b) executing by the plurality of processors each code page of the sequence broadcast to the processors; (c) monitoring the plurality of processors to determine whether all of the processors have completed execution of the current code page from their corresponding local memories; and (d) after all of the processors have finished executing the current code page, repeating (a)-(c) with a next sequential code page of the sequence treated as the current code page, at least until all code pages of the sequence have been executed.
37. The method according to claim 36, further comprising storing the sequence of code pages of the task in a common program memory.
38. The method according to claim 36, further comprising processing a different dataset by each respective processor.
39. The method according to claim 36, wherein the task is a voice-processing task, and wherein the method further comprises processing a plurality of voice-data streams in parallel with the plurality of processors.
40. The method according to claim 36, wherein each processor executes the current code page from a respectively different local memory.
41. The method according to claim 36, wherein each local memory is shared by two processors.
42. The method according to claim 36, wherein each code page of the sequence has a size of less than about 4K instruction words.
43. The method according to claim 36, wherein broadcasting a current code page in step (a) comprises alternating broadcasts of code pages of the task to a first plurality of processors and code pages of a second task to a second plurality of processors.
44. The method according to claim 36, wherein the method is performed entirely within an integrated circuit.
45. The method according to claim 36, wherein steps (a)-(c) are performed by a task-control processor that executes a control program.
46. A method of executing a task in parallel on a plurality of processors, comprising: (a) broadcasting on a code distribution bus a current code page of the sequence of code pages to concurrently load the current code page into a plurality of local memories of the plurality of processors; (b) executing in parallel by the plurality of processors each code page of the sequence broadcast to the processors, each processor following an execution path in the current code page that is dependent upon a data set processed by that processor; (c) monitoring the plurality of processors as they asynchronously execute the current code page to determine whether all of the processors have completed execution of the current code page from their corresponding local memories; and (d) after all of the processors have finished executing the current code page, repeating (a)-(c) with a next sequential code page of the sequence treated as the current code page, at least until all code pages of the sequence have been executed.
47. The method according to claim 46, further comprising storing the sequence of code pages of the task in a common program memory.
48. The method according to claim 46, further comprising processing a different dataset by each respective processor.
49. The method according to claim 46, wherein the task is a voice-processing task, and wherein the method further comprises processing a plurality of voice-data streams in parallel with the plurality of processors.
50. The method according to claim 46, wherein each processor executes the current code page from a respectively different local memory.
51. The method according to claim 46, wherein each local memory is shared by two processors.
52. The method according to claim 46, wherein each code page of the sequence has a size of less than about 4K instruction words.
53. The method according to claim 46, wherein broadcasting a current code page in step (a) comprises alternating broadcasts of code pages of the task to a first plurality of processors and code pages of a second task to a second plurality of processors.
54. The method according to claim 46, wherein the method is performed entirely within an integrated circuit.
55. The method according to claim 46, wherein steps (a)-(c) are performed by a task-control processor that executes a control program.
56. The method according to claim 46, wherein at least one code page comprises a program loop.
57. The method according to claim 46, further comprising a plurality of data sets, at least two data sets being different from each other, and wherein each processor processes a respectively different data set of a plurality of data sets using all of the code pages in the sequence.
58. The method according to claim 46, wherein the plurality of processors collectively control a state of a common signal line to indicate their completion status of the current code page, and wherein step (c) further comprises monitoring the common signal line.
59. The method according to claim 46, wherein, by executing the code pages, each of the processors respectively processes voice data of a voice channel in real time.
60. A parallel processing system, comprising: a plurality of processors coupled to a common code distribution bus, each processor comprising a local memory capable of storing executable code received from the common code distribution bus and each processor being capable of following an execution path that is different from an execution path of another processor of the plurality of processors during asynchronous parallel execution of a like code page; and a task-control processor capable of concurrently loading the local memories of the processors with like code pages over the common code distribution bus.
61. The parallel processing system according to claim 60, wherein the task-control processor is capable of monitoring like code page execution by the plurality of processors, and capable of loading a next like code page into the respective local memories of the processors in response to all processors completing execution of a current like code page.
62. The parallel processing system according to claim 60, wherein the task-control processor is capable of loading only one like code page at a time into the respective local memories of the processors.
63. The parallel processing system according to claim 60, wherein the plurality of processors and the task-control processor are part of a common integrated circuit device.
64. The parallel processing system according to claim 60, wherein the plurality of processors is a portion of another plurality of processors included within a common integrated circuit device.
65. The parallel processing system according to claim 60, further comprising a common memory that is shared by the plurality of processors, wherein the common memory is used by the processors to receive data inputs and to provide data outputs.
66. The parallel processing system according to claim 60, wherein each processor is capable of processing a respectively different voice-channel data stream in real time during parallel execution of the like code pages.
67. The parallel processing system according to claim 60, wherein the plurality of processors form one task group of a plurality of task groups that are coupled to the common code distribution bus, and wherein the task-control processor is capable of alternating code-page loading events between the plurality of task groups over the code distribution bus.
68. The parallel processing system according to claim 67, wherein each processor comprises a programmable register capable of storing an identifier associated with a particular task group.
69. The parallel processing system according to claim 67, wherein each task group processes a respectively different voice-data stream in real time during parallel execution of a current code page, and wherein each processor of a task group follows an execution path in the current code page that is dependent upon the voice-data stream being processed.
70. The parallel processing system according to claim 67, wherein the task-control processor is capable of loading the respective local memories of the processors from a common program memory that is capable of storing a sequence of code pages for a task.
71. The parallel processing system according to claim 60, wherein each local memory has an instruction word size of between about 1K and about 4K.
72. The parallel processing system according to claim 60, wherein the plurality of processors is capable of collectively controlling a state of a common signal line that indicates their completion status of a current code page, and wherein the task-control processor is capable of monitoring the common signal line to determine whether all of the processors have finished executing the current code page.
73. The parallel processing system according to claim 60, wherein at least one of the code pages comprises a program loop.
74. A processor cluster, comprising: a task-control processor capable of broadcasting code pages over a code distribution bus, the code pages being associated with a predetermined task-group identifier; a plurality of processors, each processor being coupled to the code distribution bus and being associated with a predetermined task-group identifier, each processor comprising a local memory from which the processor executes code, each respective local memory concurrently receiving code pages associated with the predetermined task-group identifier with which the corresponding processor is associated, and each processor capable of asynchronously executing in parallel each code page in the memory of the processor under the control of the task-control processor.
75. The processor cluster according to claim 74, wherein the task-control processor is capable of monitoring code page execution by the plurality of processors and, in response to all processors associated with a predetermined task-group identifier, capable of loading a next code page into the respective local memories of the processors associated with the predetermined task-group identifier.
76. The processor cluster according to claim 74, wherein the task-control processor is capable of loading only one code page at a time into the respective local memories.
77. The processor cluster according to claim 74, wherein the plurality of processors and the task-control processor are part of a common integrated circuit device.
78. The processor cluster according to claim 74, wherein a number of processors forming the plurality of processors is between about 4 and about 10, inclusive.
79. A processor cluster according to claim 74, wherein each processor associated with a selected predetermined task-group identifier is capable of processing in real time a respectively different voice-channel data stream.
80. A processor cluster according to claim 79, wherein at least two processors associated with the selected task-group identifier follow different execution paths as the two processors asynchronously execute in parallel a current code page, each execution path being dependent upon the respective voice-channel data stream processed by the particular processor.
81. A processor cluster according to claim 74, wherein the processors associated with a selected task-group identifier are capable of collectively controlling a state of a common signal line to indicate their corresponding completion status of a current code page, and wherein the task-control processor is capable of monitoring the common signal line to determine whether all processors associated with the selected task-group identifier have completed executing the current code page.
82. A processor cluster according to claim 74, wherein at least one code page comprises a program loop.
83. A parallel processing system, comprising: a plurality of local program memories; a plurality of processors coupled to a common code-distribution bus, each cluster processor being associated with a common task, each processor capable of asynchronously executing code from the local program memory in parallel with other processors of the plurality of processors; and a task-control processor coupled to a code distribution bus, the task-control processor being capable of broadcasting a code page of a sequence of code pages on the code-distribution bus for concurrent loading into the respective local program memories, and capable of monitoring a completion status of each of the processors as the processors asynchronously execute a current code page.
84. A parallel processing system according to claim 83, further comprising a common program memory capable of storing the sequence of code pages.
85. A parallel processing system according to claim 83, wherein each processor is associated with a selected task-group identifier, the parallel processing system further comprising a common program memory capable of storing code pages, and wherein the code pages are each associated with at least one task-group identifier.
86. A parallel processing system according to claim 83, wherein the task-control processor is capable of monitoring execution of a current code page by the plurality of processors, and is capable of loading a next code page into the local program memories in response to all processors completing execution of the current code page.
87. A parallel processing system according to claim 83, wherein the task-control processor is capable of loading only one code page at a time into the respective local program memories.
88. A parallel processing system according to claim 83, wherein each respective local program memory is associated with one processor.
89. A parallel processing system according to claim 83, wherein each respective local program memory is associated with two processors.
90. A parallel processing system according to claim 83, wherein each local program memory comprises an instruction word size of about 1K to about 4K, inclusive.
91. A parallel processing system according to claim 83, wherein the plurality of processors form one task group of a plurality of task groups that are coupled to the common code distribution bus, and wherein the task-control processor is capable of alternating code-page loading events between the plurality of task groups over the code distribution bus.
92. A parallel processing system according to claim 83, wherein the plurality of processors, the plurality of local program memories, and the task-control processor are integrated within a common integrated circuit device.
93. A parallel processing system according to claim 83, further comprising a common memory capable of being shared by the plurality of processors, and wherein the processors use the common memory for receiving data inputs and for providing data outputs.
94. A parallel processing system according to claim 83, wherein each processor processes a respectively different data set during parallel execution of a like code page.
95. A parallel processing system according to claim 83, wherein each processor is capable of processing a respectively different voice-channel data stream in real time.
96. A parallel processing system according to claim 83, wherein each processor is capable of following an execution path that is different from an execution path followed by another processor during parallel execution of a current code page.
97. A parallel processing system according to claim 83, wherein at least one code page comprises a program loop.
98. A parallel processing system according to claim 83, wherein the processors are capable of collectively controlling a state of a common signal line to indicate a completion status of a current code page, and wherein the task-control processor is capable of monitoring the common signal line to determine whether all processors have finished executing the current code page.
99. A parallel processing system, comprising: a plurality of processor means coupled to a common code distribution bus, each processor means comprising a local memory means for storing executable code received from the common code distribution bus and each processor means for executing a code page stored in the local memory means and following an execution path that is different from an execution path of another processor means of the plurality of processor means during asynchronous parallel execution of a like code page; and a task-control processor means for concurrently loading the local memory means of the processor means with like code pages over the common code distribution bus.
100. The parallel processing system according to claim 99, wherein the task-control processor means is further for monitoring like code page execution by the plurality of processor means, and for loading a next like code page into the respective local memory means of the processor means in response to all processor means completing execution of a current like code page.
101. The parallel processing system according to claim 99, wherein the task-control processor means is further for loading only one like code page at a time into the respective local memory means of the processor means.
102. The parallel processing system according to claim 99, wherein the plurality of processor means and the task-control processor means are part of a common integrated circuit device.
103. The parallel processing system according to claim 99, wherein the plurality of processor means are a portion of another plurality of processor means included within a common integrated circuit device.
104. The parallel processing system according to claim 99, further comprising a common memory that is shared by the plurality of processor means, wherein the common memory is used by the processor means to receive data inputs and to provide data outputs.
105. The parallel processing system according to claim 99, wherein each processor means is further for processing a respectively different voice-channel data stream in real time during parallel execution of the like code pages.
106. The parallel processing system according to claim 99, wherein the plurality of processor means form one task group of a plurality of task groups that are coupled to the common code distribution bus, and wherein the task-control processor means is further for alternating code-page loading events between the plurality of task groups over the code distribution bus.
107. The parallel processing system according to claim 106, wherein each processor means comprises a programmable register means for storing an identifier associated with a particular task group.
108. The parallel processing system according to claim 106, wherein each task group processes a respectively different voice-data stream in real time during parallel execution of a current code page, and wherein each processor means of a task group follows an execution path in the current code page that is dependent upon the voice-data stream being processed.
109. The parallel processing system according to claim 106, wherein the task-control processor means is further for loading the respective local memory means of the processor means from a common program memory that is capable of storing a sequence of code pages for a task.
110. The parallel processing system according to claim 99, wherein each local memory has an instruction word size of between about 1K and about 4K.
111. The parallel processing system according to claim 99, wherein the plurality of processor means is capable of collectively controlling a state of a common signal line that indicates their completion status of a current code page, and wherein the task-control processor means is further for monitoring the common signal line to determine whether all of the processor means have finished executing the current code page.
112. The parallel processing system according to claim 99, wherein at least one code page comprises a program loop.
113. A processor cluster, comprising: a task-control processor means for broadcasting code pages over a code distribution bus, the code pages being associated with a predetermined task-group identifier; a plurality of processor means, each processor means being coupled to the code distribution bus and being associated with a predetermined task-group identifier, each processor means comprising a local memory from which the processor executes code, each respective local memory concurrently receiving code pages associated with the predetermined task-group identifier with which the corresponding processor means is associated, and each processor means for asynchronously executing in parallel each code page in the memory of the processor means under the control of the task-control processor means.
114. The processor cluster according to claim 113, wherein the task-control processor means is further for monitoring code page execution by the plurality of processor means and, in response to all processor means associated with a predetermined task-group identifier, for loading a next code page into the respective local memories of the processor means associated with the predetermined task-group identifier.
115. The processor cluster according to claim 113, wherein the task-control processor means is capable of loading only one code page at a time into the respective local memories.
116. The processor cluster according to claim 113, wherein the plurality of processor means and the task-control processor means are part of a common integrated circuit device.
117. The processor cluster according to claim 113, wherein a number of processor means forming the plurality of processor means is between about 4 and about 10, inclusive.
118. A processor cluster according to claim 113, wherein each processor means associated with a selected predetermined task-group identifier is further for processing in real time a respectively different voice-channel data stream.
119. A processor cluster according to claim 118, wherein at least two processor means associated with the selected task-group identifier follow different execution paths as the two processor means asynchronously execute in parallel a current code page, each execution path being dependent upon the respective voice-channel data stream processed by the particular processor means.
120. A processor cluster according to claim 113, wherein the processor means associated with a selected task-group identifier are further for collectively controlling a state of a common signal line to indicate their corresponding completion status of a current code page, and wherein the task-control processor means is further for monitoring the common signal line to determine whether all processor means associated with the selected task-group identifier have completed executing the current code page.
121. A processor cluster according to claim 113, wherein at least one code page comprises a program loop.
122. A parallel processing system, comprising: a plurality of local program memories; a plurality of processor means coupled to a common code-distribution bus, each processor means being associated with a common task, each processor means for asynchronously executing code from the local program memory in parallel with other processors of the plurality of processors; and a task-control processor means coupled to a code distribution bus, the task-control processor means for broadcasting a code page of a sequence of code pages on the code-distribution bus for concurrent loading into the respective local program memories, and for monitoring a completion status of each of the processor means as the processor means asynchronously execute a current code page.
123. A parallel processing system according to claim 122, further comprising a common program memory means for storing the sequence of code pages.
124. A parallel processing system according to claim 122, wherein each processor means is associated with a selected task-group identifier, the parallel processing system further comprising a common program memory means for storing code pages, and wherein the code pages are each associated with at least one task-group identifier.
125. A parallel processing system according to claim 122, wherein the task-control processor means is further for monitoring execution of a current code page by the plurality of processor means, and for loading a next code page into the local program memories in response to all processor means completing execution of the current code page.
126. A parallel processing system according to claim 122, wherein the task-control processor means is further for loading only one code page at a time into the respective local program memories.
127. A parallel processing system according to claim 122, wherein each respective local program memory is associated with one processor means.
128. A parallel processing system according to claim 122, wherein each respective local program memory is associated with two processor means.
129. A parallel processing system according to claim 122, wherein each local program memory comprises an instruction word size of about 1K to about 4K, inclusive.
130. A parallel processing system according to claim 122, wherein the plurality of processor means form one task group of a plurality of task groups that are coupled to the common code distribution bus, and wherein the task-control processor means is further for alternating code-page loading events between the plurality of task groups over the code distribution bus.
 131. A parallelprocessing system according to claim 122, wherein the plurality ofprocessor means, the plurality of local program memories, and thetask-control processor means are integrated within a common integratedcircuit device.
 132. A parallel processing system according to claim122, further comprising a common memory capable of being shared by theplurality of processor means, and wherein the processor means use thecommon memory for receiving data inputs and for providing data outputs.133. A parallel processing system according to claim 122, wherein eachprocessor means processes a respectively different data set duringparallel execution of a like code page.
 134. A parallel processingsystem according to claim 122, wherein each processor means is furtherfor processing a respectively different voice-channel data stream inreal time.
 135. A parallel processing system according to claim 122,wherein each processor means is further for following an execution paththat is different from an execution path followed by another processormeans during parallel execution of a current code page.
 136. A parallelprocessing system according to claim 122, wherein at least one code pagecomprises a program loop.
 137. A parallel processing system according toclaim 122, wherein the processor means are further for collectivelycontrolling a state of a common signal line to indicate a completionstatus of a current code page, and wherein the task-control processormeans is further for monitoring the common signal line to determinewhether all processors have finished executing the current code page.138. An article comprising: a computer readable medium having storedthereon instructions that, if executed, result in at least thefollowing: (a) broadcasting on a code distribution bus a current codepage of a sequence of code pages for the task to concurrently load thecurrent code page into a plurality of local memories of a plurality ofprocessors; (b) executing by the plurality of processors each code pageof the sequence broadcast to the processors; (c) monitoring theplurality of processors to determine whether all of the processors havecompleted execution of the current code page from their correspondinglocal memories; and (d) after all of the processors have finishedexecuting the current code page, repeating (a)-(c) with a nextsequential code page of the sequence treated as the current code page,at least until all code pages of the sequence have been executed. 139.The article according to claim 138, further comprising storing thesequence of code pages of the task in a common program memory.
 140. Thearticle according to claim 138, further comprising processing adifferent dataset by each respective processor.
 141. The articleaccording to claim 138, wherein the task is a voice-processing task, andwherein a plurality of voice-data streams are processed in parallel bythe plurality of processors.
 142. The article according to claim 138,wherein each processor executes the current code page from arespectively different local memory.
 143. The article according to claim 138, wherein each local memory is shared by two processors.
 144. The article according to claim 138, wherein each code page of the sequence has a size of less than about 4K instruction words.
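Claim 144 bounds each code page at less than about 4K instruction words. As an illustrative sketch only, the helper below cuts a flat instruction stream into pages of at most 4096 words; the constant `PAGE_WORDS`, the function name `split_into_pages`, and the use of integers as instruction words are assumptions. A practical tool would also need to place page boundaries so that control flow, such as the program loops of claim 136, does not cross from one page into the next, which this naive split does not attempt.

```python
from typing import List, Sequence

# Assumed page limit; the claim only says "less than about 4K" words.
PAGE_WORDS = 4096


def split_into_pages(program: Sequence[int], page_words: int = PAGE_WORDS) -> List[List[int]]:
    """Cut a flat instruction stream into consecutive fixed-size code pages."""
    if page_words <= 0:
        raise ValueError("page_words must be positive")
    return [list(program[i:i + page_words]) for i in range(0, len(program), page_words)]
```

Under this assumption a 10,000-word program yields three pages of 4096, 4096 and 1808 words.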
 145. The article according to claim 138, wherein broadcasting a current code page in step (a) comprises alternating broadcasts of code pages of the task to a first plurality of processors and code pages of a second task to a second plurality of processors.
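Claim 145 interleaves two tasks on the same code distribution bus. One way to picture the resulting bus schedule, assuming strict one-for-one alternation and ignoring the wait for each group to finish its current page, is a generator that tags every broadcast with its destination group. The group labels `"A"` and `"B"` and the name `alternate_broadcasts` are hypothetical.

```python
from itertools import zip_longest
from typing import Iterator, Sequence, Tuple


def alternate_broadcasts(
    task_a: Sequence[str], task_b: Sequence[str]
) -> Iterator[Tuple[str, str]]:
    """Yield (group, page) pairs, alternating between two processor groups.

    When one task runs out of pages, the remaining pages of the other task
    are sent back to back.
    """
    for page_a, page_b in zip_longest(task_a, task_b):
        if page_a is not None:
            yield ("A", page_a)
        if page_b is not None:
            yield ("B", page_b)
```

A plausible motivation for alternating, not stated in the claim, is that the bus can load one group's next page while the other group is still executing, so neither group leaves the bus idle.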
 146. The article according to claim 138, wherein steps (a)-(c) are performed by a task-control processor that executes a control program.
 147. An article comprising: a computer readable medium having stored thereon instructions that, if executed, result in: (a) broadcasting on a code distribution bus a current code page of the sequence of code pages to concurrently load the current code page into a plurality of local memories of a plurality of processors; (b) executing in parallel by the plurality of processors each code page of the sequence broadcast to the processors, each processor following an execution path in the current code page that is dependent upon a data set processed by that processor; (c) monitoring the plurality of processors as they asynchronously execute the current code page to determine whether all of the processors have completed execution of the current code page from their corresponding local memories; and (d) after all of the processors have finished executing the current code page, repeating (a)-(c) with a next sequential code page of the sequence treated as the current code page, at least until all code pages of the sequence have been executed.
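Claim 147 adds that each processor may follow a data-dependent path through the same code page, so processors finish at different times and step (c) must wait for the slowest one. The sketch below illustrates that under assumptions: processors are Python threads, the code page is the made-up function `code_page` (whose branch and loop trip count depend on the input sample), and completion is tracked by a hypothetical `CompletionMonitor` built on `threading.Condition`.

```python
import threading
from typing import List, Optional


class CompletionMonitor:
    """Counts how many processors have finished the current code page."""

    def __init__(self, n: int) -> None:
        self._n = n
        self._finished = 0
        self._cond = threading.Condition()

    def report_done(self) -> None:
        with self._cond:
            self._finished += 1
            self._cond.notify_all()

    def wait_all(self) -> None:
        # Block the task-control side until every processor has reported.
        with self._cond:
            self._cond.wait_for(lambda: self._finished == self._n)


def code_page(sample: int) -> int:
    # Identical code on every processor, but the branch taken and the loop
    # trip count both depend on the data: this counts the set bits of |sample|.
    if sample < 0:
        sample = -sample
    bits = 0
    while sample:
        bits += sample & 1
        sample >>= 1
    return bits


def run_page_async(datasets: List[int]) -> List[Optional[int]]:
    monitor = CompletionMonitor(len(datasets))
    results: List[Optional[int]] = [None] * len(datasets)

    def processor(i: int) -> None:
        results[i] = code_page(datasets[i])
        monitor.report_done()

    threads = [threading.Thread(target=processor, args=(i,)) for i in range(len(datasets))]
    for t in threads:
        t.start()
    monitor.wait_all()  # only now may the next code page be broadcast
    for t in threads:
        t.join()
    return results
```

For the inputs `[5, -8, 0, 7]` the processors take different paths and loop counts, yet `run_page_async` returns only once all four are done, giving `[2, 1, 0, 3]`.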
 148. The article according to claim 147, further comprising storing the sequence of code pages of the task in a common program memory.
 149. The article according to claim 147, further comprising processing a different dataset by each respective processor.
 150. The article according to claim 147, wherein the task is a voice-processing task, and wherein a plurality of voice data streams are processed in parallel by the plurality of processors.
 151. The article according to claim 147, wherein each processor executes the current code page from a respectively different local memory.
 152. The article according to claim 147, wherein each local memory is shared by two processors.
 153. The article according to claim 147, wherein each code page of the sequence has a size of less than about 4K instruction words.
 154. The article according to claim 147, wherein broadcasting a current code page in step (a) comprises alternating broadcasts of code pages of the task to a first plurality of processors and code pages of a second task to a second plurality of processors.
 155. The article according to claim 147, wherein steps (a)-(c) are performed by a task-control processor that executes a control program.
 156. The article according to claim 147, wherein at least one code page comprises a program loop.
 157. The article according to claim 147, further comprising a plurality of data sets, at least two data sets being different from each other, and wherein each processor processes a respectively different data set of a plurality of data sets using all of the code pages in the sequence.
 158. The article according to claim 147, wherein the plurality of processors collectively control a state of a common signal line to indicate their completion status of the current code page, and wherein step (c) further comprises monitoring the common signal line.
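Claim 158 has the processors collectively drive one common signal line that reports page completion. A common hardware way to build such a line, assumed here rather than taken from the claims, is a wired-AND (open-drain) net: each busy processor pulls the line low, and the line reads high only after every processor has released it. The class below models that behaviour; `CommonSignalLine` and its methods are hypothetical.

```python
from typing import List


class CommonSignalLine:
    """Hypothetical wired-AND completion line shared by n processors."""

    def __init__(self, n: int) -> None:
        self._holding_low: List[bool] = [False] * n

    def start_page(self) -> None:
        # Every processor pulls the line low ("busy") when a new page loads.
        self._holding_low = [True] * len(self._holding_low)

    def release(self, proc_id: int) -> None:
        # A processor stops pulling low once it finishes the current page.
        self._holding_low[proc_id] = False

    def level(self) -> bool:
        # High (True) only when no processor is still holding the line low.
        return not any(self._holding_low)
```

With this arrangement the task-control processor needs to sample a single line, rather than poll each processor in turn, to decide when the next page may be broadcast.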
 159. The article according to claim 147, wherein, by executing the code pages, each of the processors respectively processes voice data of a voice channel in real time.